Resource considerations for process steps

Each step in a process reads document data from a data store, performs its operation, and writes the results to a new data store. This read-process-write cycle repeats at every step. The cumulative resource usage across all steps determines the overall memory, CPU, disk, and thread demands of a process execution.

Understanding how individual steps use system resources helps you design processes that perform efficiently and avoid bottlenecks, especially under high document volumes or on constrained runtime infrastructure.

General design considerations

The following considerations apply to all process steps regardless of type.

  • Document storage: All document data, including both the document content and its associated metadata, is maintained in data store objects on disk throughout execution. This means disk I/O performance directly affects process throughput. Slow storage infrastructure increases the time spent reading and writing between steps, which compounds across complex processes.
  • XML profile parsing: When parsing XML documents with instance identifiers defined, the entire DOM tree is loaded into memory. For large XML payloads, this has a significant memory impact. Profiles without instance identifiers use SAX-based (event-based) parsing instead, which is more memory-efficient and is the better choice for large documents.
  • Map function caching: Map function execution performance can be improved by enabling function caching. This is most effective when the same input values are used as function arguments multiple times within the same execution. Caching reduces redundant computation at the cost of some additional memory.
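
Boomi's function caching is internal to the platform, but the trade-off it describes can be sketched with ordinary memoization. This is an illustrative example only (the class and method names are hypothetical, not Boomi APIs): repeated input values skip recomputation, at the cost of the cache's heap footprint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative memoization sketch, not a Boomi API: results are cached per
// input value, so repeated arguments return without re-running the function.
public class FunctionCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> function;
    private int executions = 0; // counts actual function runs, not cache hits

    public FunctionCache(Function<String, String> function) {
        this.function = function;
    }

    public String apply(String input) {
        return cache.computeIfAbsent(input, k -> {
            executions++;
            return function.apply(k);
        });
    }

    public int executions() { return executions; }
}
```

As the document notes, this only pays off when the same argument values recur within one execution; unique inputs just grow the cache.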

Resource impact by step type

Map step

The Map step parses source documents into an in-memory data graph for transformation.

  • Threads: No additional threads are created.
  • Heap: Memory usage increases in proportion to document size, driven by parsing the document into a data node graph. When Low Memory Mode is active, data nodes that exceed the cache threshold (com.boomi.container.transform.maxCacheSize, default 10,000) are written to disk instead of held in memory. XML profiles with instance identifiers load the entire DOM tree into heap, so large XML documents have a significant memory impact. XML profiles without instance identifiers use SAX parsing and do not have the same impact. Map function execution also uses additional memory; connector call functions are the most memory-intensive, while simple math and string operations have negligible impact.
  • CPU: No additional CPU usage beyond the map function execution itself.
  • Disk: When Low Memory Mode is active, data nodes exceeding the cache size threshold are stored on disk.
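
The Low Memory Mode behavior can be sketched as a threshold-based store: nodes up to the cache size stay on the heap, and anything beyond is written to disk. The real cache (`com.boomi.container.transform.maxCacheSize`, default 10,000) is internal to the Boomi runtime; the class below is a simplified, hypothetical stand-in using temp files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of threshold-based disk spill; names are illustrative,
// not Boomi internals.
public class SpillingNodeStore {
    private final int maxInMemory;
    private final Map<String, String> heapNodes = new HashMap<>();
    private final Map<String, Path> diskNodes = new HashMap<>();

    public SpillingNodeStore(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    public void put(String id, String value) {
        if (heapNodes.size() < maxInMemory) {
            heapNodes.put(id, value); // under the threshold: keep on the heap
            return;
        }
        try { // over the threshold: spill the node to disk
            Path file = Files.createTempFile("node-", ".dat");
            file.toFile().deleteOnExit();
            Files.writeString(file, value);
            diskNodes.put(id, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public String get(String id) {
        if (heapNodes.containsKey(id)) return heapNodes.get(id);
        Path file = diskNodes.get(id);
        if (file == null) return null;
        try {
            return Files.readString(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public int onHeap() { return heapNodes.size(); }
    public int onDisk() { return diskNodes.size(); }
}
```

The design trade is exactly the one the step documentation describes: heap stays bounded, but reads of spilled nodes pay disk I/O instead.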

Business Rules step

The Business Rules step evaluates a document against multiple business rules, using the document's profile structure, to determine whether the document should be accepted or rejected.

  • Threads: No additional threads are created.
  • Heap: Additional memory is used for parsing the document during rule evaluation.
  • CPU: No additional CPU usage beyond standard document parsing.
  • Disk: No additional disk usage.
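
The accept-or-reject evaluation can be sketched as a set of predicates applied to a parsed document. This is not the Boomi API; the document is represented here as a plain field map for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative sketch, not the Boomi API: a document (field map) is accepted
// only if every configured business rule passes.
public class BusinessRules {
    public static boolean accept(Map<String, String> document,
                                 List<Predicate<Map<String, String>>> rules) {
        return rules.stream().allMatch(rule -> rule.test(document));
    }
}
```

The heap cost the step documentation mentions comes from parsing the document into a form (here, the map) that the rules can evaluate.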

Cleanse step

The Cleanse step enables you to validate document field values and either repair or reject the document before further processing.

  • Threads: No additional threads are created.
  • Heap: Additional memory is used for parsing the document.
  • CPU: No additional CPU usage beyond standard document parsing.
  • Disk: No additional disk usage.
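
The repair-or-reject behavior can be sketched as a function that returns either a corrected document or an empty result. The field name and repair logic below are hypothetical examples, not Boomi configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch, not the Boomi API: validate a field value, then either
// repair the document or reject it (empty result) before it continues.
public class Cleanse {
    public static Optional<Map<String, String>> cleanse(Map<String, String> doc) {
        String amount = doc.get("amount");
        if (amount == null) {
            return Optional.empty();                      // reject: required field missing
        }
        String repaired = amount.replace("$", "").trim(); // repair: strip formatting
        try {
            Double.parseDouble(repaired);
        } catch (NumberFormatException e) {
            return Optional.empty();                      // reject: value is not numeric
        }
        Map<String, String> out = new HashMap<>(doc);
        out.put("amount", repaired);
        return Optional.of(out);
    }
}
```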

Route step

The Route step evaluates a routing condition, which may involve extracting a value from the document or from a process or connector property.

  • Threads: No additional threads are created.
  • Heap: Additional memory is used for value extraction. The most memory-intensive extractions involve profile element parameters and external lookups.
  • CPU: Additional CPU is used for value extraction. Profile element parameters and external lookups are the most CPU-intensive.
  • Disk: No additional disk usage.
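
The cost of profile element extraction comes from the fact that the document must be parsed before a value can be pulled out of it. As a rough stand-in for Boomi's internal extraction, the sketch below uses standard Java DOM parsing plus an XPath expression; the class name is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative sketch: extracting a routing value from a document requires
// parsing it first, which is where the extra heap and CPU go.
public class RouteValue {
    public static String extract(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Routing on a process or connector property, by contrast, skips the parse entirely, which is why profile element parameters and external lookups are called out as the expensive cases.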

Add to Cache step

The Add to Cache step stores and retrieves cached documents using an index.

  • Threads: No additional threads are created.
  • Heap: No additional heap usage.
  • CPU: No additional CPU usage.
  • Disk: Cache document data is stored as data stores on the file share. Cache indexes are stored as Lucene indexes on the file share. Disk usage grows in proportion to the volume and size of cached documents.
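
The disk growth pattern can be illustrated with a simplified keyed cache: document data is written to files while an index maps cache keys to those files. The real step keeps document data on the file share and uses Lucene for its indexes; the in-memory map and temp files below are hypothetical stand-ins.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch: an in-memory map stands in for the Lucene index, and
// temp files stand in for the file share. Every cached document adds disk.
public class DocumentCache {
    private final Map<String, List<Path>> index = new HashMap<>();

    public void add(String key, String document) {
        try {
            Path file = Files.createTempFile("cache-", ".dat");
            file.toFile().deleteOnExit();
            Files.writeString(file, document);
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public List<String> get(String key) {
        List<String> out = new ArrayList<>();
        for (Path p : index.getOrDefault(key, List.of())) {
            try {
                out.add(Files.readString(p));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return out;
    }
}
```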

Data Process step — Groovy and Java scripting

The Data Process step allows you to manipulate document data within a process. The resource impact described here applies to the initialization of the scripting engine, not to the execution of the script itself. Script execution resource usage is implementation-dependent.

  • Threads: An additional thread is created for the script execution.
  • Heap: Additional memory is used for loading the scripting engine classes and any additional classes referenced in the script. Groovy 2.4 has a lower memory footprint than Groovy 1.5 for equivalent scripts.
  • CPU: Potential additional CPU usage from script execution, depending on script complexity.
  • Disk: No additional disk usage.
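
The threading cost noted above can be sketched with a single-thread executor: the script body runs on its own thread while the step waits for the result. The `Callable` here is a stand-in for an actual Groovy or Java script; this is not the Boomi scripting API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: the script runs on one additional thread, and the
// step's own thread waits for it to finish.
public class ScriptRunner {
    public static String runOnOwnThread(Callable<String> script) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return executor.submit(script).get(); // parent waits for the script thread
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```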

Flow Control step

The Flow Control step splits processing across multiple parallel execution paths.

  • Threads: Multiple threads or multiple JVMs are created, depending on whether the process uses in-JVM or forked execution mode.
  • Heap: Spawning new JVMs is the most memory-intensive configuration. Each forked JVM requires its own heap allocation.
  • CPU: Additional CPU usage from parallel process execution. The number of concurrent threads or JVMs directly affects CPU demand.
  • Disk: No additional disk usage.
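
The in-JVM mode can be sketched as splitting documents across a fixed thread pool: CPU and heap demand scale with the thread count chosen. Forked-JVM mode multiplies the heap cost further, since each forked JVM carries its own full heap allocation. The class below is an illustrative sketch, not the Boomi implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative sketch of in-JVM parallelism: documents are processed
// concurrently by a fixed number of worker threads.
public class FlowControl {
    public static List<String> runParallel(List<String> documents,
                                           Function<String, String> step,
                                           int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : documents) {
                futures.add(pool.submit(() -> step.apply(doc)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // collect results in the original order
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```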

Process Call step

The Process Call step invokes a child process from within the current process.

  • Threads: A new child execution task is created. If Wait for document is enabled, no additional thread is created; the parent execution blocks until the child process completes.
  • Heap: Additional heap is required to support multiple process threads running at the same time when waiting is not enabled.
  • CPU: Additional CPU is required to support multiple process threads running concurrently.
  • Disk: No additional disk usage.
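
The two modes can be sketched side by side: with waiting enabled the child runs on the parent's thread and the parent blocks; without it, the child gets its own thread and the parent moves on, which is what drives the extra heap and CPU for concurrent process threads. This is an illustrative sketch, not the Boomi execution model's actual API.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative sketch of the two Process Call modes.
public class ProcessCall {
    /** Waiting enabled: the child runs inline and the parent blocks. */
    public static String callAndWait(Runnable child) {
        child.run();
        return "child complete";
    }

    /** Waiting disabled: the child runs on its own thread; the parent continues. */
    public static CompletableFuture<Void> callAsync(Runnable child) {
        return CompletableFuture.runAsync(child);
    }
}
```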

Process Route step

The Process Route step routes documents to one or more sub-processes based on a routing condition.

  • Threads: A new execution is created for each process route path that has documents to process. If Wait for document is enabled, no additional thread is created per path.
  • Heap: Additional heap is required to support the potential for multiple process threads executing at the same time.
  • CPU: Additional CPU is required to support concurrent process thread execution.
  • Disk: No additional disk usage.
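
The per-path execution model can be sketched as a partition step: documents are grouped by the routing condition's value, and only groups that actually contain documents would receive an execution. The class below is illustrative, not the Boomi implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative sketch: group documents by routing key; each non-empty group
// corresponds to one route path execution.
public class ProcessRoute {
    public static Map<String, List<String>> partition(List<String> documents,
                                                      Function<String, String> routeKey) {
        return documents.stream().collect(Collectors.groupingBy(routeKey));
    }
}
```

A key with no documents simply never appears in the result, matching the rule that routes without documents do not spawn executions.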